FIELD OF THE INVENTION 

The present invention relates to a method of detection for human Spinocerebellar ataxia 2 
gene variants, and more particularly their use in applications such as molecular diagnosis, 
prediction of an individual's disease susceptibility, and the genetic analysis of SCA2 gene 
in a population. The invention also provides primer and probe sequences useful in 
detecting these polymorphic variations in SCA2 gene and their use in diagnosis and 
prediction of an individual's susceptibility to SCA2 disease. 

BACKGROUND AND PRIOR ART 

Spinocerebellar ataxias (SCAs) are a clinically heterogeneous group of autosomal 
dominant neurodegenerative disorders characterized by progressive deterioration in 
balance and coordination. The clinical symptoms include ataxia, dysarthria, 
ophthalmoparesis, and variable degrees of motor weakness. The symptoms occur due to 
progressive neuronal loss primarily in the cerebellum but also in other parts of central 
nervous system. The symptoms usually begin during the third or fourth decade of life; 
however, juvenile onset has also been identified. Typically, the disease worsens gradually, 
often resulting in complete disability and death 10-20 years after the onset of symptoms. 
Individuals with juvenile onset spinocerebellar ataxias, however, typically have more 
rapid progression of the phenotype than the late onset cases. 

Seven disease loci have been identified to date as causing this phenotype - 
Spinocerebellar ataxia 1 (SCA1) (Orr et al., Nat. Genet. 4, 221-226 (1993)), SCA2 (Pulst 
et al., Nat. Genet. 14, 269-276 (1996); Sanpei et al., Nat. Genet. 14, 277-284 (1996); 
Imbert et al., Nat. Genet. 14, 285-291 (1996)), SCA3/MJD (Kawaguchi et al., Nat. 
Genet. 8, 221-227 (1994)), SCA6 (Zhuchenko et al., Nat. Genet. 15, 62-68 (1997)), SCA7 
(David et al., Nat. Genet. 17, 65-70 (1997)), SCA8 (Koob et al., Nat. Genet. 21, 379-384 
(1999)) and SCA12 (Holmes et al., Nat. Genet. 23, 391-392 (1999)). The causative 
mutation associated with all these disease types is an abnormal expansion of a trinucleotide 
repeat motif in the corresponding gene. Expansion of the repeat tract beyond the 
normal range produces a premutation allele that may expand further into a disease-producing 
mutation. 

The genomes of all organisms undergo spontaneous mutation in the course of their 
continuing evolution generating variant forms of progenitor sequences (Gusella, Ann. 
Rev. Biochem. 55, 831-854 (1986)). The variant form may confer an evolutionary 
advantage or disadvantage relative to a progenitor form or may be neutral. In some 
instances, a variant form confers a lethal disadvantage and is not transmitted to 
subsequent generations of the organism. In other instances, a variant form confers an 
evolutionary advantage to the species, is eventually incorporated into the DNA of many 
or most members of the species, and effectively becomes the progenitor form. In many 
instances, both progenitor and variant form(s) survive and co-exist in a species 
population. The coexistence of multiple forms of a sequence gives rise to polymorphisms. 
Several different types of polymorphisms have been reported. A restriction fragment 
length polymorphism (RFLP) means a variation in DNA sequence that alters the length of 
a restriction fragment, as described in Botstein et al., Am. J. Hum. Genet. 32, 314-331 
(1980). The restriction fragment length polymorphism may create or delete a restriction 
site, thus changing the length of the restriction fragment. RFLPs have been widely used 
in human and animal genetic analyses (Donis-Keller, Cell 51, 319-337 (1987)). Other 
polymorphisms take the form of short tandem repeats (STRs) that include tandem di-, tri- 
and tetranucleotide repeated motifs. These tandem repeats are also referred to as variable 
number tandem repeat (VNTR) polymorphisms. VNTRs have been used in identity and 
paternity analysis and in a large number of genetic mapping studies. 

Other polymorphisms take the form of single nucleotide variations between individuals of 
the same species. Such polymorphisms are far more frequent than RFLPs, STRs and 
VNTRs. Some single nucleotide polymorphisms (SNPs) occur in protein-coding 
sequences, in which case, one of the polymorphic forms may give rise to the expression 
of a defective or other variant protein and, potentially, a genetic disease. Examples of 
genes in which polymorphisms within coding sequences give rise to genetic disease 
include beta-globin (sickle cell anemia) and CFTR (cystic fibrosis). Other single 
nucleotide polymorphisms occur in non-coding regions. Some of these polymorphisms 
may also result in defective protein expression (e.g., as a result of defective splicing). 
Other single nucleotide polymorphisms have no phenotypic effects. 



SNPs can be used in the same manner as RFLPs and VNTRs but offer several 
advantages. SNPs occur with greater frequency and are spaced more uniformly 
throughout the genome than other forms of polymorphism. The greater frequency and 
uniformity of SNPs means that there is a greater probability that such a polymorphism 
will be found in close proximity to a genetic locus of interest than would be the case for 
other polymorphisms. Also, the different forms of characterized SNPs are often easier to 
distinguish than other types of polymorphism (e.g., by use of assays employing 
allele-specific hybridization probes or primers). 

Spinocerebellar ataxia 2 (SCA2), which was initially described in a Cuban population 
(Gispert et al., Nat. Genet. 4, 295-299 (1993)), has now been reported worldwide. The 
human SCA2 gene has 25 exons and encompasses approximately 130 kb in the 12q23-24.1 
region of chromosome 12 (Sahba et al., Genomics 47, 359-364 (1998)). The molecular 
basis of the disease is an expansion of a CAG repeat tract in exon 1 of the SCA2 gene. The 
molecular diagnosis of clinically suspected SCA2 patients is carried out by accurately 
sizing the CAG repeats at the SCA2 locus. In normal individuals this CAG repeat is 
not only polymorphic in length, ranging from 14 to 31 repeats with a mode of 22 repeats, 
but also cryptic in nature, having one or more interrupting CAA triplets. In contrast, the 
SCA2 disease alleles contain a pure, contiguous stretch of 34 to 59 CAG repeats. 

Sanpei and Tsuji (patents CA2241173, EP0878543 and WO 98/18920) have provided 
cDNA fragments of the gene causative of spinocerebellar ataxia type 2 having a 
determined base sequence. Pulst and Ramos, in patent WO 97/42314, have also provided 
isolated nucleic acids encoding the human SCA2 protein or fragments thereof and a 
method of diagnosis of SCA2 disease. 

Tsuji and Sanpei have also patented a method for specifically diagnosing SCA2 (patents 
CA2232311, EP0869186 and WO 98/03679). Therein the method comprises effecting 
PCR by employing DNA to be tested as template and using nucleic acid primers 
hybridizable with parts of the base sequence of the SCA2 gene. The diagnosis 
depends on the number of CAG repeat units in the SCA2 gene: a patient with SCA2 
has 35 or more CAG repeat units, while the gene of a normal subject 
has 15 to 24 repeats. 



However, these methods are not useful for detecting normal individuals carrying repeats 
predisposed to instability and expansion (premutation alleles) as the repeat length alone 
would not be a reliable predictor of repeat instability at the SCA2 locus, owing to the 
presence of a varying number of CAA interruptions. The presence of interruptions within 
the triplet repeats has been shown to play an important role in determining repeat 
stability in a number of 
trinucleotide repeat disorders (Chung et al., Nat. Genet. 5, 254-258 (1993); Kunst et al., 
Cell 77, 853-861 (1994); Eichler et al., Nat. Genet. 8, 88-94 (1994)). It has been proposed 
that the presence of these interruptions confers stability and their absence predisposes 
alleles to instability and eventual disease status. 

The prior art lacks any method that associates the allelic variants of the SCA2 gene with 
disease susceptibility. The prior art also lacks any study that correlates the 
substructure of the SCA2 CAG repeat with repeat instability and predisposition to the SCA2 
disease. This is the first demonstration of the detection of single nucleotide 
polymorphisms in the human SCA2 gene and of their use for applications such as molecular 
diagnosis, prediction of an individual's SCA2 disease susceptibility or otherwise, and/or 
the genetic analysis of the SCA2 gene in a population. The novelty of the present invention 
lies in providing a method for detecting allelic variants of the SCA2 gene within the human 
population and their association with the disease, for prediction of an individual's 
predisposition to SCA2. 

OBJECTS OF THE INVENTION 

The main object of the present invention is to provide a method of detection of allelic 
variants of the human SCA2 gene. 

Another object is to provide allele specific primers and probes useful for detection of 
allelic variants of human SCA2 gene. 

Yet another object of the invention is to provide a method for establishing association of 
SCA2 allelic variants with disease susceptibility. 

Still another object of the invention is to provide a method for screening individuals 
carrying SCA2 alleles predisposed to instability and expansion. 

SUMMARY OF THE INVENTION 

The present invention relates to allelic variants of human Spinocerebellar ataxia 2 
(SCA2) gene and provides allele-specific primers and probes suitable for detecting these 
allelic variants for applications such as molecular diagnosis, prediction of an individual's 
disease susceptibility, and/or the genetic analysis of SCA2 gene in a population. 

DETAILED DESCRIPTION OF THE INVENTION 

The present invention relates to the detection of the allelic variants of the human SCA2 
gene and their utility in predicting an individual's susceptibility to the SCA2 disease. 

Accordingly, the present invention provides a method of detection of human Spinocerebellar 
ataxia 2 gene variants, said method comprising the steps of: 

1. designing and synthesizing oligonucleotide primers for PCR amplification of CAG 
repeat containing region of exon 1 of human SCA2 gene, 

2. amplifying genomic DNA of SCA2 patients and normal control individuals using 
the above said primers, 

3. sequencing the amplified PCR product and identifying sequence variations 
computationally by comparing it with the already existing sequence of human SCA2 
gene, 

4. screening normal control individuals and SCA2 patients for novel single nucleotide 
polymorphisms using allele specific oligonucleotide probes, 

5. computing the frequencies of CC and GT haplotypes in normals and SCA2 patients, 

6. establishing the association of the CC and GT haplotype with the SCA2 disease 
based on their frequency distribution in normals and SCA2 patients, 

7. predicting the risk of or susceptibility to the SCA2 disease based on the haplotype 
present at the polymorphic sites in the individual tested, the GT haplotype being at low 
risk and the CC haplotype at high risk of the disease. 
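
Steps 5 to 7 amount to a lookup from the bases observed at the two polymorphic sites to a 
haplotype and a risk label. A minimal sketch in Python is given below, assuming the bases 
have already been called for a single chromosome; the function name `classify_chromosome` 
and the label strings are illustrative and are not part of the specification. 

```python
# Risk labels follow step 7 of the method: GT = low risk, CC = high risk.
# The dictionary name and the label strings are illustrative assumptions.
RISK_BY_HAPLOTYPE = {"GT": "low", "CC": "high"}
VALID_BASES = {"A", "C", "G", "T"}


def classify_chromosome(base_at_481: str, base_at_552: str) -> tuple:
    """Return (haplotype, risk) for one chromosome.

    Positions 481 and 552 refer to GenBank U70323.
    Haplotypes other than GT and CC are reported as "undetermined".
    """
    first, second = base_at_481.upper(), base_at_552.upper()
    if first not in VALID_BASES or second not in VALID_BASES:
        raise ValueError("each site must be a single base: A, C, G or T")
    haplotype = first + second
    return haplotype, RISK_BY_HAPLOTYPE.get(haplotype, "undetermined")


if __name__ == "__main__":
    for bases in (("G", "T"), ("C", "C"), ("G", "C")):
        print(bases, "->", classify_chromosome(*bases))
```

The sketch works per chromosome; resolving the two haplotypes of a diploid sample is a 
separate phasing question that it does not address. 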

In an embodiment, the primers suitable for amplification of the SCA2 gene region 
containing one or more polymorphic sites are selected from the group consisting of SEQ 
ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6, 
SEQ ID NO: 7, SEQ ID NO: 8 and complements thereof. 

In another embodiment, the allele specific oligonucleotide probes useful for detection of 
SCA2 gene variants are selected from the group consisting of SEQ ID NO: 9, SEQ ID 
NO: 10, SEQ ID NO: 11, SEQ ID NO: 12 and the complements thereof, wherein the 
polymorphic site occupies a central position of the probe. 

In yet another embodiment, the length of the oligonucleotide primers and probes is in the 
range of 5 to 100 bases. 

In still another embodiment, allelic variants of the SCA2 gene have GT and CC haplotypes. 
Further, the invention provides a diagnostic kit for the detection of SNP haplotypes 
(CC/GT) comprising suitable primers and probes selected from the polynucleotide sequences 
of SEQ ID NO: 1 to 12. 

In another embodiment of the invention a nucleic acid vector may contain the allelic 
variants of SCA2 gene. 

In an embodiment of the invention primers suitable for amplification of SCA2 gene 
region containing one or more polymorphic sites are provided, said primers selected from 
the group comprising : 

a) 5' CTC CGC CTC AGA CTG TTT TGG TAG 3' (as listed in SEQ ID NO: 1); and 

b) 5' GTG GCC GAG GAC GAG GAG AC 3' (as listed in SEQ ID NO: 2) and 
complements thereof. 

In yet another embodiment of the invention allele specific primers suitable for detection 
of allelic variants of SCA2 gene are provided, selected from the group comprising: 

a) 5' CTC GGC GGG CCT CCC CGC CCC TTC GTC GTC C 3' (as listed in SEQ 
ID NO: 3); 

b) 5' CTC GGC GGG CCT CCC CGC CCC TTC GTC GTC G 3' (as listed in SEQ 
ID NO: 4); 

c) 5' CCT CCC CGC CCC TTC GTC GTC 3' (as listed in SEQ ID NO: 5); 

d) 5' CGC CAA CCC GCG CCT CCC CGC TCG GCG CCC GC 3' (as listed in 
SEQ ID NO: 6); 

e) 5' CGC CAA CCC GCG CCT CCC CGC TCG GCG CCC GT 3' (as listed in 
SEQ ID NO: 7); and 

f) 5' GCG CCT CCC CGC TCG GCG CCC G 3' (as listed in SEQ ID NO: 8) and 
complements thereof. 

In still another embodiment of the invention allele specific probes useful for detection of 
SCA2 gene variants wherein the polymorphic site occupies a central position of the probe 
are provided, said allele specific probes selected from the group comprising: 

a) 5' CCC CTT CGT CGT CCT CCT TCT CCC CCT 3' (as listed in SEQ ID NO: 9); 

b) 5' CCC CTT CGT CGT CGT CCT TCT CCC CCT 3' (as listed in SEQ ID NO: 10); 

c) 5' CGC TCG GCG CCC GCG CGT CCC CGC CGC 3' (as listed in SEQ ID NO: 11); and 

d) 5' CGC TCG GCG CCC GTG CGT CCC CGC CGC 3' (as listed in SEQ ID NO: 12) and 
complements thereof. 
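
The statement that the polymorphic site occupies a central position of each probe can be 
checked directly from the four probe sequences listed above. The following Python sketch 
compares the two probes of each pair and reports where they differ; the helper name 
`mismatch_positions` is an illustrative assumption, not part of the specification. 

```python
# Probe sequences as listed in SEQ ID NO: 9 to 12 (5' to 3').
PROBES = {
    9: "CCC CTT CGT CGT CCT CCT TCT CCC CCT",
    10: "CCC CTT CGT CGT CGT CCT TCT CCC CCT",
    11: "CGC TCG GCG CCC GCG CGT CCC CGC CGC",
    12: "CGC TCG GCG CCC GTG CGT CCC CGC CGC",
}


def mismatch_positions(seq_a: str, seq_b: str) -> list:
    """Return (1-based position, base in seq_a, base in seq_b) for each mismatch."""
    a, b = seq_a.replace(" ", ""), seq_b.replace(" ", "")
    if len(a) != len(b):
        raise ValueError("probes of a pair must have the same length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    for first, second in ((9, 10), (11, 12)):
        length = len(PROBES[first].replace(" ", ""))
        centre = (length + 1) // 2  # middle base of an odd-length probe
        print(f"SEQ ID NO: {first} vs {second}: length {length}, centre {centre}, "
              f"mismatches {mismatch_positions(PROBES[first], PROBES[second])}")
```

Each pair differs at a single base, which lies at the middle of the 27-base probe. 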

The allelic variants of human SCA2 gene may comprise one or more of the following 
single nucleotide polymorphisms as compared with the human SCA2 complete cDNA 
sequence in the data base (GenBank accession number U70323). 



Table 1 

         Site of change     Base change     Amino-acid alteration 
(A)      481                G-C             Val - Leu 
(B)      552                T-C             Arg - Arg 

The sites of change are in accordance with the human SCA2 complete cDNA sequence in 
the database (GenBank accession number U70323). 

The invention also provides a method of analysing a nucleic acid from an individual for 
the base present at any one of the polymorphic sites shown in Table 1. This type of 
analysis can be performed on a plurality of individuals who are tested either for the 
presence of or for the predisposition to the SCA2 disease. The susceptibility to the disease 
can then be established based on the base or set of bases present at the 
polymorphic sites in the individuals tested. 

The invention also provides oligonucleotide sequences (as listed in SEQ ID NO: 1 to 12), 
suitable for use as allele specific primers and probes for the detection of polymorphic 
sites listed in Table 1. 

Further, a diagnostic kit comprising one or more of the allele specific primers or probes 
along with the required buffers and accessories suitable for identification of SCA2 allelic 
variants to establish an individual's susceptibility to SCA2 disease is also included in the 
invention. 

Eukaryotic expression vectors comprising a DNA sequence coding for a protein or a 
peptide according to the invention are new materials and are also included in the 
invention. Host cells, for example cloned human cell lines, can be transformed using the 
new expression vectors and are also included in the invention. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The manner in which the above-mentioned features, advantages and objects of the 
invention, as well as others which will become clear, are attained and can be understood 
in detail is illustrated in the appended drawings. These drawings form a part of the 
specification. It is to be noted, however, that the appended drawings illustrate preferred 
embodiments of the invention and are therefore not to be considered limiting in their 
scope. 

In the drawing(s) accompanying this specification: 

Figure 1 is a schematic representation of the two novel single nucleotide 
polymorphisms in the SCA2 gene. The top line depicts the position of the 25 exons of the 
SCA2 gene. The second line shows the relative locations of the two polymorphic sites 
and the CAG repeat tract in exon 1 of the SCA2 gene. Both polymorphisms are also 
shown in sequence context below the gene. 



Figure 2 shows the distribution of CAA triplets in the SCA2 CAG repeat tract of 215 
normal chromosomes. Open circles represent CAG triplets and dark circles represent 
CAA triplets. Alleles are grouped by GT or CC haplotype and are arranged in 
ascending order of repeat length. 

Figure 3 shows the frequency distribution of CAA interruptions in normal SCA2 
chromosomes with the GT haplotype (open bars) and the CC haplotype (filled bars). 
Frequencies on the Y-axis are percentages of the 152 alleles with the GT haplotype or of 
the 63 alleles with the CC haplotype. 

Figure 4 shows the details of the SNP with reference ID 695871 submitted by the 
applicants in the SNP database (http://www.ncbi.nlm.nih.gov/SNP/). 

Figure 5 shows the details of the SNP with reference ID 695872 submitted by the 
applicants in the SNP database (http://www.ncbi.nlm.nih.gov/SNP/). 

Figure 6 shows the complete cDNA sequence of the human SCA2 mRNA submitted 
by Pulst, S.-M. in the GenBank database (http://www.ncbi.nlm.nih.gov/SNP/). 

Other and further aspects, features, and advantages of the present invention will be 
apparent from the following description of the preferred embodiments of the invention 
given for the purpose of disclosure. Alternative embodiments of the invention can be 
envisaged by those skilled in the art. All such alternative embodiments are intended to lie 
within the scope of this invention. 

I. Novel Polymorphisms of the Invention 

As a first step to the present invention, the applicants carried out PCR amplification 
of the CAG repeat containing region of exon 1 of the human SCA2 gene using new 
oligonucleotide primers. These primers were designed in accordance with the human 
SCA2 complete cDNA sequence submitted by Pulst, S.-M. in the database (GenBank 
accession number U70323). Sequencing of the purified PCR product revealed two 
novel single nucleotide polymorphisms (SNPs) in exon 1 of the human SCA2 gene. It was 
apparent, therefore, that there is a hitherto unrecognized allele or subtype of the human 
SCA2 gene. 

The present invention provides a sequence for the allelic variants of human 
spinocerebellar ataxia 2 (SCA2) gene comprising one or more of the following single 
nucleotide polymorphisms compared with the human SCA2 complete cDNA sequence in 
the data base (GenBank accession number U70323). 



Table 1 

         Site of change     Base change     Amino-acid alteration 
(A)      481                G-C             Val - Leu 
(B)      552                T-C             Arg - Arg 

The sites of changes are in accordance with the human SCA2 complete cDNA sequence 
in the database (GenBank accession number U70323). 

(The applicants submitted these two SNPs to the SNP database 
(http://www.ncbi.nlm.nih.gov/SNP/) on August 2, 2000. The first SNP, at position 481 
and having either a G or a C base, has the reference SNP ID 695871. The reference SNP 
ID for the second SNP, at position 552 and having either a T or a C base, is 695872.) 

The first polymorphic site (A), as shown in Figure 1, has either a G or a C base and is 
177 bp upstream of the polymorphic SCA2 CAG repeat stretch. The second polymorphic 
site (B) is situated 106 bp upstream of the CAG repeat tract and contains either a T or a C 
base. While the first substitution changes the amino acid sequence from valine to leucine, 
the second substitution is neutral. 

For example, the nucleotide sequence of the allelic variant of human SCA2 gene having 
polymorphic sites as listed in Table 1 may be- 

5' C TCC GCC TCA GAC TGT TTT GGT AGC AAC GGC AAC GGC GGC GGC 
GCG TTT CGG CCC GGC TCC CGG CGG CTC CTT GGT CTC GGC GGG CCT 
CCC CGC CCC TTC GTC GTC CTC CTT CTC CCC CTC GCC AGC CCG GGC GCC 
CCT CCG GCC GCG CCA ACC CGC GCC TCC CCG CTC GGC GCC CGC GCG 
TCC CCG CCG CGT TCC GGC GTC TCC TTG GCG CGC CCG GCT CCC GGC 
TGT CCC CGC CCG GCG TGC GAG CCG GTG TAT GGG CCC CTC ACC ATG 
TCG CTG AAG CCC CAG CAG CAG CAG CAG CAG CAG CAG CAA CAG CAG 
CAG CAG CAA CAG CAG CAG CAG CAG CAG CAG CAG CCG CCG CCC GCG 
GCT GCC AAT GTC CGC AAG CCC GGC GGC AGC GGC CTT CTA GCG TCG 
CCC GCC GCC GCG CCT TCG CCG TCC TCG TCC TCG GTC TCC TCG TCC TCG 
GCC AC 3' 

In the above sequence the SNPs (A) and (B) are at nucleotide position 107 and 178 
respectively and are shown in bold. 
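
The stated positions can be cross-checked against the printed sequence. The Python sketch 
below reads the amplicon exactly as printed above, looks up the bases at positions 107 and 
178, and measures their distance to the start of the CAG repeat tract. The offset of 374 
between amplicon and cDNA numbering is not stated in the text; it is inferred here from the 
paired positions (107, 481) and (178, 552) and should be treated as an assumption. All 
variable and function names are illustrative. 

```python
# Amplicon as printed in the text (5' to 3'); whitespace is removed when joining.
AMPLICON = "".join("""
C TCC GCC TCA GAC TGT TTT GGT AGC AAC GGC AAC GGC GGC GGC
GCG TTT CGG CCC GGC TCC CGG CGG CTC CTT GGT CTC GGC GGG CCT
CCC CGC CCC TTC GTC GTC CTC CTT CTC CCC CTC GCC AGC CCG GGC GCC
CCT CCG GCC GCG CCA ACC CGC GCC TCC CCG CTC GGC GCC CGC GCG
TCC CCG CCG CGT TCC GGC GTC TCC TTG GCG CGC CCG GCT CCC GGC
TGT CCC CGC CCG GCG TGC GAG CCG GTG TAT GGG CCC CTC ACC ATG
TCG CTG AAG CCC CAG CAG CAG CAG CAG CAG CAG CAG CAA CAG CAG
CAG CAG CAA CAG CAG CAG CAG CAG CAG CAG CAG CCG CCG CCC GCG
GCT GCC AAT GTC CGC AAG CCC GGC GGC AGC GGC CTT CTA GCG TCG
CCC GCC GCC GCG CCT TCG CCG TCC TCG TCC TCG GTC TCC TCG TCC TCG
GCC AC
""".split())

SNP_A_POS, SNP_B_POS = 107, 178   # 1-based amplicon positions given in the text
CDNA_OFFSET = 481 - SNP_A_POS     # inferred offset to U70323 numbering (assumption)


def base_at(position: int) -> str:
    """Return the base at a 1-based amplicon position."""
    return AMPLICON[position - 1]


def repeat_start() -> int:
    """1-based position of the first triplet of the CAG repeat tract."""
    return AMPLICON.index("CAGCAGCAG") + 1


if __name__ == "__main__":
    start = repeat_start()
    print("amplicon length:", len(AMPLICON))
    print("site A:", base_at(SNP_A_POS), "distance to repeat:", start - SNP_A_POS)
    print("site B:", base_at(SNP_B_POS), "distance to repeat:", start - SNP_B_POS)
    print("site B in cDNA numbering:", SNP_B_POS + CDNA_OFFSET)
```

The printed sequence carries a C at both sites, that is, the CC haplotype. 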

II. Association Analysis with the Disease 

Analysis of these two SNPs in 215 normal and 50 expanded SCA2 chromosomes 
revealed that although four haplotypes are possible with two biallelic polymorphic 
systems, only two were observed: the GT and the CC haplotype. No GC or CT allele was 
detected in our sample set, suggesting that either these alleles are very rare or G, T and 
C, C are exclusively linked to each other. The frequency of each haplotype in normal and 
expanded SCA2 chromosomes is summarized in Table 2. 



Table 2 

CAG repeat size              No. of chromosomes     Percentage GT        Percentage CC 
                             studied (n)            haplotype (n)        haplotype (n) 
Normal (18-31 repeats)       215                    70.7% (152)          29.3% (63) 
Expanded (> 32 repeats)      50                     0.0% (0)             100% (50) 



In the 215 normal chromosomes tested, the GT and CC haplotypes were represented at 
70.7% and 29.3%, respectively. Further studies on expanded chromosomes revealed a 
highly significant (χ² = 76.589, p < 0.0001) difference in the distribution of the two SNP 
haplotypes between the normal and the expanded SCA2 chromosomes (Table 2). All the SCA2 
chromosomes (n = 50) segregated with the CC allele, showing that the disease chromosomes 
are in complete association with the CC haplotype. In order to establish the molecular 
basis for the susceptibility of CC alleles to the SCA2 expansion mutation, we performed 
CAA interspersion analysis of the SCA2 CAG repeat stretch for chromosomes with the GT and 
CC haplotypes. 
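
A comparison of this kind can be sketched as a test of independence on the 2x2 table of 
haplotype counts from Table 2. The Python example below computes the uncorrected Pearson 
chi-square statistic and its one-degree-of-freedom p-value. The text does not state which 
variant of the test produced the reported statistic, so this calculation is only an 
illustration and is not expected to reproduce the reported value exactly. The function 
names are assumptions made for the example. 

```python
import math


def pearson_chi_square_2x2(table) -> float:
    """Uncorrected Pearson chi-square statistic for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def p_value_one_df(chi2: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Counts from Table 2: rows = normal, expanded; columns = GT, CC.
TABLE_2 = ((152, 63), (0, 50))

if __name__ == "__main__":
    chi2 = pearson_chi_square_2x2(TABLE_2)
    print(f"GT frequency in normals: {152 / 215:.1%}")
    print(f"chi-square = {chi2:.3f}, p = {p_value_one_df(chi2):.3g}")
```

Because one cell of the table is zero, an exact test such as Fisher's would be a reasonable 
alternative; that choice is left open here. 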

Among the total of 215 control chromosomes analysed for CAA interspersion pattern, 
1.8% (4/ 215) contained none, 20.9% (53/ 215) had one, 76.7% (157/ 215) had two and 
0.5% (1/ 215) had three CAA interruptions (Figure 2). A marked split was observed in 
the number and the pattern of CAA interruptions in the alleles with GT and CC 
haplotype. 98% (149/ 152) of the chromosomes with GT alleles had two or more CAA 
interruptions, while 86% (54/ 63) of the CC alleles had either one interruption or 
none. This difference in the number of interruptions present on GT and CC alleles, 
shown graphically in Figure 3, is highly significant. The first 5' CAA interruption was 
observed at triplet position 9 and the second at position 14 in 97.4% (148/ 152) of the 
GT alleles. In contrast, 73% (46/ 63) of the CC alleles had their first 5' interruption at 
position 14, suggesting the absence of the most proximal 5' CAA interruption. Again, a 
significant difference in the position of the first CAA interruption was observed between 
the two SNP haplotypes. 

When normal chromosomes of similar length with the GT and the CC haplotypes were 
compared by CAA interspersion pattern, the CC alleles were found to have fewer 
interruptions than the GT alleles. This results in a concomitant increase in the 
pure CAG repeat length in chromosomes with the CC haplotype. Similarly, for the 215 randomly 
selected normal chromosomes (Figure 2), the average length of the longest uninterrupted 
CAG repeat tract was significantly greater (one-tailed t test, p < 0.0001) in CC alleles 
(13.3 repeats) than in GT alleles (8.03 repeats). 
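
The two quantities used in this analysis, the positions of the CAA interruptions and the 
length of the longest uninterrupted CAG run, can be derived from a repeat tract as in the 
Python sketch below. The example tract is the 22-triplet repeat region of the amplicon 
sequence printed in this specification; the function names are illustrative assumptions. 

```python
def split_triplets(tract: str) -> list:
    """Split a repeat tract into consecutive triplets."""
    tract = tract.replace(" ", "").upper()
    if len(tract) % 3 != 0:
        raise ValueError("tract length must be a multiple of 3")
    return [tract[i:i + 3] for i in range(0, len(tract), 3)]


def caa_positions(tract: str) -> list:
    """1-based triplet positions of CAA interruptions, counted from the 5' end."""
    return [i + 1 for i, t in enumerate(split_triplets(tract)) if t == "CAA"]


def longest_pure_cag_run(tract: str) -> int:
    """Length, in triplets, of the longest uninterrupted run of CAG."""
    longest = current = 0
    for triplet in split_triplets(tract):
        current = current + 1 if triplet == "CAG" else 0
        longest = max(longest, current)
    return longest


# Repeat tract of the printed amplicon: (CAG)8 CAA (CAG)4 CAA (CAG)8.
EXAMPLE_TRACT = "CAG" * 8 + "CAA" + "CAG" * 4 + "CAA" + "CAG" * 8

if __name__ == "__main__":
    print("repeat length:", len(split_triplets(EXAMPLE_TRACT)))
    print("CAA positions:", caa_positions(EXAMPLE_TRACT))
    print("longest pure CAG run:", longest_pure_cag_run(EXAMPLE_TRACT))
```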

It has been proposed that a minimal length of pure repeats is required to initiate instability 
at a repeat locus. The presence of interruptions breaks the repeats into smaller repeat 
tracts and thus protects the repeat from instability by reducing the length of continuous 
uninterrupted repeats. There is evidence in the case of SCA1 and fragile X syndrome that 
larger uninterrupted repeats are more likely to expand than cryptic repeats. This is also 
true for dinucleotide repeats, where the degree of polymorphism of a repeat locus is 
generally proportional to the length of the perfect repeat. The fact that 98% of the normal 
chromosomes with the GT haplotype have two or more CAA interruptions, while the majority 
of the alleles with a single CAA interruption or none are associated with the CC 
haplotype (Figure 3), suggests that the absence of CAA interruptions within the CAG 
repeat tract is one of the factors contributing to repeat instability and facilitating repeat 
expansion in chromosomes with the CC haplotype. This is further supported by the 
observation that the average length of the longest uninterrupted repeat tract is much 
greater in CC alleles (13.3 repeats) than in GT alleles (8.03 repeats). The variability in 
repeat length is also reduced as the overall number of interruptions increases. For 
example, the length of the uninterrupted CAG repeat tract in alleles with one interruption 
and the CC haplotype ranges from 5 to 22, whereas for alleles with two or more CAA 
interruptions and the GT haplotype, the range is 8 to 13 pure CAG repeats. 

Therefore, haplotype analysis carried out using the two novel SNPs suggests that both the 
CAG repeat length and its substructure are important parameters in the assessment of the 
stability of SCA2 repeat alleles. The presence of CAA interruptions at the SCA2 locus plays 
an important role in conferring stability on the CAG repeats, and their absence predisposes 
alleles to expansion and eventually to disease status. The complete association of the CC 
haplotype with SCA2 expanded chromosomes, and the presence of only one or no 
interrupting CAA triplet in control chromosomes with the CC haplotype, indicate that this 
novel allelic variant of the SCA2 gene is predisposed to expansion. Conversely, the 
absence of the GT haplotype in expanded chromosomes suggests that the GT alleles are at 
nearly zero risk for SCA2 disease. Therefore, these SNP haplotypes in the human SCA2 
gene could be used as a means of establishing individual risk for SCA2. Moreover, the 
presence of these two novel SNPs in very close proximity to the SCA2 repeat region also 
makes them very useful genetic markers for studying the origin and the evolution of the SCA2 
expansion mutation. The association of the CC/GT haplotypes with the SCA2 disease 
was studied in an Indian population. However, a similar association, i.e., the GT haplotype 
being at low risk and the CC haplotype at high risk for SCA2 disease, can be expected to hold 
true for other human populations as well. 

III. Diagnostic Kits 

The invention further provides a diagnostic kit comprising one or more allele-specific 
oligonucleotides as described in SEQ ID NO: 1 to 12. Often, the kits contain one or 
more pairs of allele-specific oligonucleotides hybridizing to different forms of a 
polymorphism. In some kits, the allele-specific oligonucleotides are provided 
immobilized on a substrate. For example, the same substrate can comprise allele-specific 
oligonucleotide probes for detecting at least one or all of the polymorphisms shown in 
Table 1. Optional additional components of the kit include, for example, restriction 
enzymes, reverse transcriptase or polymerase, the substrate nucleoside triphosphates, 
means used to label (for example, an avidin-enzyme conjugate and enzyme substrate and 
chromogen if the label is biotin), and the appropriate buffers for reverse transcription, 
PCR, or hybridization reactions. Usually, the kit also contains instructions for carrying 
out the methods. 

IV. Nucleic acid Vectors 

Variant genes can be expressed in an expression vector in which a variant gene is 
operably linked to a native or other promoter. Usually, the promoter is a eukaryotic 
promoter for expression in a mammalian cell. The transcription regulation sequences 
typically include a heterologous promoter and optionally an enhancer, which is 
recognized by the host. The selection of an appropriate promoter, for example trp, lac, 
phage promoters, glycolytic enzyme promoters and tRNA promoters, depends on the host 
selected. Commercially available expression vectors can also be used. Suitable host cells 
include bacteria such as E. coli, yeast, filamentous fungi, insect cells, mammalian cells, 
typically immortalized, e.g., mouse, CHO, human and monkey cell lines and derivatives 
thereof. Preferred host cells are able to process the variant gene product to produce an 
appropriate mature polypeptide. 

The invention further provides transgenic non-human animals capable of expressing an 
exogenous variant gene and/or having one or both alleles of an endogenous variant gene 
inactivated. Expression of an exogenous variant gene is usually achieved by operably 
linking the gene to a promoter and optionally an enhancer, and microinjecting the 
construct into a zygote. Inactivation of endogenous variant genes can be achieved by 
forming a transgene in which a cloned variant gene is inactivated by insertion of a 
positive selection marker. The transgene is then introduced into an embryonic stem cell, 
where it undergoes homologous recombination with an endogenous variant gene. Mice 
and other rodents are preferred animals. Such animals provide useful drug screening 
systems. 

The invention is illustrated by the following diagrams wherein : 

The following examples are given by way of illustration of the present invention and 
should not be construed to limit the scope of the present invention. 

EXAMPLE 1 

Identification of allelic variants of SCA2 gene: 

This example describes the identification of allelic variants of human Spinocerebellar 
ataxia 2 gene by PCR and sequencing using certain oligonucleotide primers according to 
the invention. DNA was extracted from human peripheral blood leukocytes using a 
modification of the salting out procedure. The concentration of the DNA was determined 
by measuring the optical density of the sample, at a wavelength of 260 nm. The DNA 
was then amplified by polymerase chain reaction by using the oligonucleotide primers: 

1. 5' CTC CGC CTC AGA CTG TTT TGG TAG 3' (as listed in SEQ ID NO: 1) and 

2. 5' GTG GCC GAG GAC GAG GAG AC 3 ' (as listed in SEQ ID NO: 2). 

The samples were denatured at 94°C for 3 min, followed by 35 cycles of denaturation 
(94°C, 45 sec), annealing (52°C, 30 sec) and extension (72°C, 45 sec), and a final 
extension of 7 min at 72°C in a Perkin Elmer GeneAmp PCR System 9600. This reaction 
produced a DNA fragment of 459 bp when analysed by GeneScan analysis on an ABI Prism 377 
automated DNA sequencer (the 459 bp product had 22 repeats in the polymorphic CAG repeat 
region). The PCR product was purified from a band cut out of an agarose gel using a 
QIAquick gel extraction kit (Qiagen), and both strands of the PCR product were 
directly sequenced with the PCR primers using dye terminator chemistry on an ABI Prism 377 
automated DNA sequencer. The PCR product was shown to be identical to the 
human ataxin-2 (SCA2) mRNA, complete cds sequence in the database (accession 
number U70323), submitted by Pulst, S.-M., except for the two 
single base changes listed in Table 1. 

EXAMPLE 2 

Nucleotide sequence of the Allelic Variant of SCA2 gene: 

The nucleotide sequence of the allelic variant of the SCA2 gene, derived using the method 
described in Example 1, is: 

5' C TCC GCC TCA GAC TGT TTT GGT AGC AAC GGC AAC GGC GGC GGC 
GCG TTT CGG CCC GGC TCC CGG CGG CTC CTT GGT CTC GGC GGG CCT 
CCC CGC CCC TTC GTC GTC CTC CTT CTC CCC CTC GCC AGC CCG GGC GCC 
CCT CCG GCC GCG CCA ACC CGC GCC TCC CCG CTC GGC GCC CGC GCG 
TCC CCG CCG CGT TCC GGC GTC TCC TTG GCG CGC CCG GCT CCC GGC 
TGT CCC CGC CCG GCG TGC GAG CCG GTG TAT GGG CCC CTC ACC ATG 
TCG CTG AAG CCC CAG CAG CAG CAG CAG CAG CAG CAG CAA CAG CAG 
CAG CAG CAA CAG CAG CAG CAG CAG CAG CAG CAG CCG CCG CCC GCG 
GCT GCC AAT GTC CGC AAG CCC GGC GGC AGC GGC CTT CTA GCG TCG 
CCC GCC GCC GCG CCT TCG CCG TCC TCG TCC TCG GTC TCC TCG TCC TCG 
GCC AC 3' 

In the above sequence the two SNPs given in Table 1 are at nucleotide positions 107 
and 178, respectively (a C at each position in the sequence shown). 
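The figures quoted for this amplicon can be cross-checked with a short script. The following Python sketch is an illustration only and is not part of the original disclosure: it strips the layout spaces from the sequence listed above and checks its length, its primer ends (SEQ ID NO: 1 and the reverse complement of SEQ ID NO: 2), the bases at positions 107 and 178, and the number of CAG/CAA units in the repeat tract. The names `AMPLICON`, `reverse_complement`, `count_repeat_units` and `base_at` are hypothetical helpers introduced here.

```python
import re

# Amplicon as listed in Example 2 (5' -> 3'); the spaces are layout only.
AMPLICON = (
    "C TCC GCC TCA GAC TGT TTT GGT AGC AAC GGC AAC GGC GGC GGC "
    "GCG TTT CGG CCC GGC TCC CGG CGG CTC CTT GGT CTC GGC GGG CCT "
    "CCC CGC CCC TTC GTC GTC CTC CTT CTC CCC CTC GCC AGC CCG GGC GCC "
    "CCT CCG GCC GCG CCA ACC CGC GCC TCC CCG CTC GGC GCC CGC GCG "
    "TCC CCG CCG CGT TCC GGC GTC TCC TTG GCG CGC CCG GCT CCC GGC "
    "TGT CCC CGC CCG GCG TGC GAG CCG GTG TAT GGG CCC CTC ACC ATG "
    "TCG CTG AAG CCC CAG CAG CAG CAG CAG CAG CAG CAG CAA CAG CAG "
    "CAG CAG CAA CAG CAG CAG CAG CAG CAG CAG CAG CCG CCG CCC GCG "
    "GCT GCC AAT GTC CGC AAG CCC GGC GGC AGC GGC CTT CTA GCG TCG "
    "CCC GCC GCC GCG CCT TCG CCG TCC TCG TCC TCG GTC TCC TCG TCC TCG "
    "GCC AC"
).replace(" ", "")

FORWARD_PRIMER = "CTCCGCCTCAGACTGTTTTGGTAG"  # SEQ ID NO: 1
REVERSE_PRIMER = "GTGGCCGAGGACGAGGAGAC"      # SEQ ID NO: 2


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_repeat_units(seq):
    """Count the CAG/CAA units in the longest uninterrupted CAG/CAA tract."""
    tracts = re.findall(r"(?:CA[GA])+", seq)
    return max(len(tract) for tract in tracts) // 3


def base_at(seq, position):
    """Return the base at a 1-based nucleotide position."""
    return seq[position - 1]


if __name__ == "__main__":
    print("length:", len(AMPLICON))
    print("starts with forward primer:", AMPLICON.startswith(FORWARD_PRIMER))
    print("ends with reverse primer site:",
          AMPLICON.endswith(reverse_complement(REVERSE_PRIMER)))
    print("base at 107:", base_at(AMPLICON, 107))
    print("base at 178:", base_at(AMPLICON, 178))
    print("CAG/CAA units:", count_repeat_units(AMPLICON))
```

Counting the two CAA interruptions as repeat units gives a tract of 22 units, which is consistent with the 459 bp product and 22 repeats reported in Example 1.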

EXAMPLE 3 

GT alleles are at nearly zero risk for SCA2 disease: 

A method as described in Example 1 is applied to a series of DNA samples extracted from 
Spinocerebellar ataxia 2 positive individuals and normal controls. A statistically 
significant difference (p < 0.0000) is observed in the frequency distributions of the SNP 
haplotypes generated using the single nucleotide polymorphisms in normal and expanded 
SCA2 chromosomes. The results obtained are summarized in the table below: 



                                          SCA2 haplotype 
Diagnosis                                 GT          CC 
Control Individuals                       70.7%       29.3% 
Spinocerebellar ataxia 2 Patients         0.0%        100.0% 



The complete association of the CC haplotype with SCA2 disease chromosomes indicates that 
SCA2 alleles with the CC haplotype are predisposed to expansion. In other words, the 
absence of the GT haplotype in expanded chromosomes indicates that GT alleles are at 
nearly zero risk for SCA2 disease. Therefore, these SNP haplotypes in the human 
Spinocerebellar ataxia 2 gene could be used as a method of establishing individual risk of 
Spinocerebellar ataxia 2. The association of the CC/GT haplotypes with SCA2 
disease was studied in an Indian population. However, a similar association, i.e., the GT 
haplotype being at low risk and the CC haplotype being at high risk for SCA2 disease, can be 
expected to hold true for other human populations also. 
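The haplotype-to-risk reading described in this example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used by the inventors: it assumes the two SNP alleles have already been typed on the same chromosome (position 107 first, position 178 second), and the names `RISK_BY_HAPLOTYPE`, `classify_haplotype` and `haplotype_frequencies` are hypothetical. Allele combinations other than GT and CC are not discussed in the example, so the sketch reports them as undetermined.

```python
from collections import Counter

# Risk wording follows Example 3. Any haplotype other than GT or CC is not
# described there, so it is reported as "undetermined" (an assumption).
RISK_BY_HAPLOTYPE = {
    "GT": "low risk",
    "CC": "high risk",
}


def classify_haplotype(allele_107, allele_178):
    """Map the alleles at positions 107 and 178 to a risk label."""
    haplotype = (allele_107 + allele_178).upper()
    return RISK_BY_HAPLOTYPE.get(haplotype, "undetermined")


def haplotype_frequencies(haplotypes):
    """Return the fraction of chromosomes carrying each haplotype."""
    counts = Counter(h.upper() for h in haplotypes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {hap: n / total for hap, n in counts.items()}


if __name__ == "__main__":
    # Made-up input for demonstration only; not data from the study.
    chromosomes = ["GT", "GT", "CC", "GT"]
    for hap, freq in haplotype_frequencies(chromosomes).items():
        print(hap, f"{freq:.1%}", classify_haplotype(hap[0], hap[1]))
```

The frequency helper mirrors the tabulation above: applied separately to control and patient chromosomes, it yields the two rows of the table.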

EXAMPLE 4 

Allele specific primers used for the detection of the allelic variants of SCA2 gene: 

1. 5' CTC GGC GGG CCT CCC CGC CCC TTC GTC GTC C 3' (as listed in SEQ ID 
NO: 3) 

2. 5' CTC GGC GGG CCT CCC CGC CCC TTC GTC GTC G 3' (as listed in SEQ ID 
NO: 4) 

3. 5' CCT CCC CGC CCC TTC GTC GTC 3' (as listed in SEQ ID NO: 5) 

4. 5' CGC CAA CCC GCG CCT CCC CGC TCG GCG CCC GC 3' (as listed in SEQ 
ID NO: 6) 

5. 5' CGC CAA CCC GCG CCT CCC CGC TCG GCG CCC GT 3' (as listed in SEQ 
ID NO: 7) 

6. 5' GCG CCT CCC CGC TCG GCG CCC G 3' (as listed in SEQ ID NO: 8) 
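The relationship between the six primers listed above can be checked programmatically. The Python sketch below is illustrative only and is not part of the original disclosure: it confirms that SEQ ID NO: 3 and 4, and likewise SEQ ID NO: 6 and 7, differ only at their 3'-terminal base (the allele-discriminating position), and that SEQ ID NO: 5 and 8 correspond to the bases immediately upstream of that terminal base. The helper names `oligo` and `differ_only_at_3prime_end` are hypothetical.

```python
def oligo(text):
    """Normalise a primer written in triplets to a plain upper-case string."""
    return text.replace(" ", "").upper()


# Keys are the SEQ ID NO values given in Example 4.
ALLELE_SPECIFIC_PRIMERS = {
    3: oligo("CTC GGC GGG CCT CCC CGC CCC TTC GTC GTC C"),
    4: oligo("CTC GGC GGG CCT CCC CGC CCC TTC GTC GTC G"),
    5: oligo("CCT CCC CGC CCC TTC GTC GTC"),
    6: oligo("CGC CAA CCC GCG CCT CCC CGC TCG GCG CCC GC"),
    7: oligo("CGC CAA CCC GCG CCT CCC CGC TCG GCG CCC GT"),
    8: oligo("GCG CCT CCC CGC TCG GCG CCC G"),
}


def differ_only_at_3prime_end(a, b):
    """True if two primers are identical except for their last (3') base."""
    return len(a) == len(b) and a[:-1] == b[:-1] and a[-1] != b[-1]


if __name__ == "__main__":
    p = ALLELE_SPECIFIC_PRIMERS
    for first, second in ((3, 4), (6, 7)):
        print(f"SEQ ID NO: {first}/{second}",
              "3' bases:", p[first][-1], p[second][-1],
              "| differ only at 3' end:",
              differ_only_at_3prime_end(p[first], p[second]))
```

A 3'-terminal mismatch is the usual basis for allele-specific amplification, since extension from a mismatched 3' end is inefficient; the specification does not state the reaction conditions used with these primers.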

EXAMPLE 5 

Allele specific oligonucleotide probes used for detection of the SCA2 gene variants: 

1. 5' CCC CTT CGT CGT CCT CCT TCT CCC CCT 3' (as listed in SEQ ID NO: 9) 

2. 5' CCC CTT CGT CGT CGT CCT TCT CCC CCT 3' (as listed in SEQ ID NO: 10) 

3. 5' CGC TCG GCG CCC GCG CGT CCC CGC CGC 3' (as listed in SEQ ID NO: 11) 

4. 5' CGC TCG GCG CCC GTG CGT CCC CGC CGC 3' (as listed in SEQ ID NO: 12) 
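The placement of the polymorphic base within each probe can be verified with a short comparison. The Python sketch below is an illustration, not part of the original disclosure: it compares each pair of probes listed above and reports the 1-based positions at which they differ. The helper names `oligo`, `mismatch_positions` and `central_position` are hypothetical.

```python
def oligo(text):
    """Normalise a probe written in triplets to a plain upper-case string."""
    return text.replace(" ", "").upper()


# Keys are the SEQ ID NO values given in Example 5.
PROBES = {
    9: oligo("CCC CTT CGT CGT CCT CCT TCT CCC CCT"),
    10: oligo("CCC CTT CGT CGT CGT CCT TCT CCC CCT"),
    11: oligo("CGC TCG GCG CCC GCG CGT CCC CGC CGC"),
    12: oligo("CGC TCG GCG CCC GTG CGT CCC CGC CGC"),
}


def mismatch_positions(a, b):
    """Return the 1-based positions at which two equal-length probes differ."""
    if len(a) != len(b):
        raise ValueError("probes must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


def central_position(seq):
    """Return the 1-based middle position of an odd-length sequence."""
    return (len(seq) + 1) // 2


if __name__ == "__main__":
    for first, second in ((9, 10), (11, 12)):
        a, b = PROBES[first], PROBES[second]
        print(f"SEQ ID NO: {first}/{second}",
              "length:", len(a),
              "| centre:", central_position(a),
              "| mismatches at:", mismatch_positions(a, b))
```

Each pair is 27 bases long and differs at a single base, position 14, so the polymorphic site sits at the centre of each probe.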

EXAMPLE 6 

Nucleic acid vectors containing the SCA2 variant sequences: 

Expression vectors and host cells transformed with the allelic variant of the SCA2 gene 
containing one or more polymorphic sites as listed in Table 1 can be prepared, for 
example, as detailed below. 

An allelic variant of the SCA2 gene can be expressed in an expression vector in which the 
variant gene is operably linked to a native or other promoter. Usually, the promoter is a 
eukaryotic promoter for expression in a mammalian cell. The transcription regulation 
sequences typically include a heterologous promoter and optionally an enhancer, which is 
recognized by the host. The selection of an appropriate promoter, for example trp, lac, 
phage promoters, glycolytic enzyme promoters and tRNA promoters, will depend on the 
host selected. Commercially available expression vectors can also be used. 

The means of introducing the expression construct into a host cell will depend 
upon the particular construction and the target host. Suitable means include fusion, 
conjugation, transfection, transduction, electroporation or injection. A wide variety of 
host cells, both prokaryotic and eukaryotic, can be employed for expression of the variant 
gene. Suitable host cells include bacteria such as E. coli, yeast, filamentous fungi, 
insect cells and mammalian cells, typically immortalized, e.g., mouse, CHO, human and 
monkey cell lines and derivatives thereof. Preferred host cells are able to process the 
variant gene product to produce an appropriate mature polypeptide. 

ADVANTAGES: 

The invention is useful for establishing the genotype or base variations of the SCA2 gene. The 
information may be useful for molecular diagnosis, prediction of an individual's disease 
susceptibility to SCA2, prognosis and/or the genetic analysis of the SCA2 gene in a 
population. The frequency of these variants can also be used to predict the prevalence of 
SCA2 disease among various populations. 

CLAIMS 

1. A method of detection of human Spinocerebellar ataxia 2 gene variants and the said 
method comprises: 

a) designing and synthesizing oligonucleotide primers for PCR amplification of 
CAG repeat containing region of exon 1 of human SCA2 gene, 

b) amplifying genomic DNA of SCA2 patients and normal control individuals 
using the said primers of step (a), 

c) sequencing the amplified PCR product and identifying the sequence variations 
computationally by comparing it with the already existing sequence of human 
SCA2 gene, 

d) screening normal control individuals and SCA2 patients for novel single 
nucleotide polymorphisms using allele specific oligonucleotide probes, 

e) computing the frequencies of CC and GT haplotypes in normals and SCA2 
patients, 

f) establishing the association of the CC and GT haplotype with the SCA2 disease 
based on their frequency distribution in normals and SCA2 patients, and 

g) predicting the risk or susceptibility to the SCA2 disease based on the haplotype 
present at the polymorphic sites in the individual tested, GT haplotype being at 
low risk and CC haplotype at high risk to the disease. 

2. A method as claimed in claim 1 wherein, the primers suitable for amplification of the 
SCA2 gene region containing one or more polymorphic sites, are selected from the 
group: 

a) 5' CTC CGC CTC AGA CTG TTT TGG TAG 3' (as listed in SEQ ID NO: 1); and 

b) 5' GTG GCC GAG GAC GAG GAG AC 3' (as listed in SEQ ID NO: 2) and 
complements thereof. 

3. A method as claimed in claim 1 wherein, the allele specific primers suitable for 
detection of the allelic variants of SCA2 gene are selected from: 

a) 5' CTC GGC GGG CCT CCC CGC CCC TTC GTC GTC C 3' (as listed in SEQ 
ID NO: 3); 

b) 5' CTC GGC GGG CCT CCC CGC CCC TTC GTC GTC G 3' (as listed in SEQ 
ID NO: 4); 

c) 5' CCT CCC CGC CCC TTC GTC GTC 3' (as listed in SEQ ID NO: 5); 

d) 5' CGC CAA CCC GCG CCT CCC CGC TCG GCG CCC GC 3' (as listed in 
SEQ ID NO: 6); 

e) 5' CGC CAA CCC GCG CCT CCC CGC TCG GCG CCC GT 3' (as listed in 
SEQ ID NO: 7); and 

f) 5' GCG CCT CCC CGC TCG GCG CCC G 3' (as listed in SEQ ID NO: 8) and 
complements thereof. 

4. A method as claimed in claim 1 wherein, the allele specific probes useful for 
detection of SCA2 gene variants, wherein the polymorphic site occupies a central 
position of the probe, are selected from: 

a) 5' CCC CTT CGT CGT CCT CCT TCT CCC CCT 3' (as listed in SEQ ID NO: 9); 

b) 5' CCC CTT CGT CGT CGT CCT TCT CCC CCT 3' (as listed in SEQ ID NO: 10); 

c) 5' CGC TCG GCG CCC GCG CGT CCC CGC CGC 3' (as listed in SEQ ID NO: 11); and 

d) 5' CGC TCG GCG CCC GTG CGT CCC CGC CGC 3' (as listed in SEQ ID NO: 12) 
and complements thereof. 

5. A method as claimed in claim 1 wherein, the length of the oligonucleotide primers 
and probes of claims 2, 3 and 4 is in the range of 5 to 100 bases. 

6. A diagnostic kit for the detection of SNP haplotypes (CC/GT) comprising suitable 
primers and probes selected from the group consisting of sequences given under 
SEQ ID NO: 1 to 12. 

7. Primers suitable for amplification of SCA2 gene region containing one or more 
polymorphic sites, said primers selected from the group comprising : 

a) 5' CTC CGC CTC AGA CTG TTT TGG TAG 3' (as listed in SEQ ID NO: 1); and 

b) 5' GTG GCC GAG GAC GAG GAG AC 3' (as listed in SEQ ID NO: 2) and 
complements thereof. 

8. Allele specific primers suitable for detection of allelic variants of SCA2 gene, 
selected from the group comprising: 

a) 5' CTC GGC GGG CCT CCC CGC CCC TTC GTC GTC C 3' (as listed in SEQ 
ID NO: 3); 

b) 5' CTC GGC GGG CCT CCC CGC CCC TTC GTC GTC G 3' (as listed in SEQ 
ID NO: 4); 

c) 5' CCT CCC CGC CCC TTC GTC GTC 3' (as listed in SEQ ID NO: 5); 

d) 5' CGC CAA CCC GCG CCT CCC CGC TCG GCG CCC GC 3' (as listed in 
SEQ ID NO: 6); 

e) 5' CGC CAA CCC GCG CCT CCC CGC TCG GCG CCC GT 3' (as listed in 
SEQ ID NO: 7); and 

f) 5' GCG CCT CCC CGC TCG GCG CCC G 3' (as listed in SEQ ID NO: 8) and 
complements thereof. 

9. Allele specific probes useful for detection of SCA2 gene variants wherein the 
polymorphic site occupies a central position of the probe, said allele specific probes 
selected from the group comprising: 

a) 5' CCC CTT CGT CGT CCT CCT TCT CCC CCT 3' (as listed in SEQ ID NO: 9); 

b) 5' CCC CTT CGT CGT CGT CCT TCT CCC CCT 3' (as listed in SEQ ID NO: 10); 

c) 5' CGC TCG GCG CCC GCG CGT CCC CGC CGC 3' (as listed in SEQ ID NO: 11); and 

d) 5' CGC TCG GCG CCC GTG CGT CCC CGC CGC 3' (as listed in SEQ ID NO: 12) 
and complements thereof. 

ABSTRACT 

The present invention relates to allelic variants of human Spinocerebellar ataxia 2 
(SCA2) gene and provides allele-specific primers and probes suitable for detecting these 
allelic variants for applications such as molecular diagnosis, prediction of an individual's 
disease susceptibility, and/or the genetic analysis of SCA2 gene in a population. 

Figure 5: dbSNP submission records for the single nucleotide polymorphism in the SCA2 gene (screen captures). 

Reference SNP record: NCBI SNP ID rs695872; GenBank: U70323; submitter handle FGU-CBT; release date Aug 2 2000. 
Variation summary: assay sample size 430 chromosomes; populations with frequency data: 1; individuals with genotype data: 0; average estimated heterozygosity 0.414; frequency of allele C = 0.293. 
Validation summary: Mendelian segregation: UNKNOWN; PCR results confirmed in multiple reactions: YES; homozygotes detected in individual genotype data: UNKNOWN. 
Publication details: authors Choudhry, S. and Brahmachari, S.K. 
Method details: primers 5' CTCCGCCTCAGACTGTTTTGGTAG ...; approximately 100 ng of genomic DNA ... 5 mM Tris, 25 mM KCl, 0.75 mM MgCl2, 0.05% ... and 0.5 U of Taq DNA polymerase; thermal cycling, gel purification and sequencing as described in Example 1. 
Submitter contact details: Shweta Choudhry and Prof. Samir K. Brahmachari, Functional Genomics Unit, Centre for Biochemical Technology (CSIR), Delhi University Campus, Mall Road, Delhi-110007, India. 

Figure 6: GenBank record for human ataxin-2 (SCA2) mRNA, complete cds (screen capture). 

LOCUS HSU70323, 4481 bp mRNA; ACCESSION U70323; GI: 1679683; organism Homo sapiens; chromosome 12, map 12q24.1; gene SCA2 (spinocerebellar ataxia type 2); product ataxin-2; base count 1144 a, 1380 c, 1014 g, 943 t. 
Reference 1 (bases 1 to 4481): Pulst, S.-M. et al., Moderate expansion of a normally biallelic trinucleotide repeat in spinocerebellar ataxia type 2, Nature Genet. 14 (3), 269-276 (1996). 
Reference 2 (bases 1 to 4481): Pulst, S.-M., Direct Submission, Cedars-Sinai, 8700 Beverly Blvd., Los Angeles, CA 90048, USA. 
The protein translation and the 4481-base nucleotide sequence of the record follow in the original figure. 

SEQUENCE LISTING 

GENERAL INFO 
APPLICANT: CSIR 

TITLE OF INVENTION: Method of detection of human Spinocerebellar ataxia 2 gene 
variants. 

NO. OF SEQUENCES: 12 

CORRESPONDENCE ADDRESS: Center for Biochemical Technology, CSIR, Delhi 
University Campus, Mall Road, Delhi-110007, India. 
Telephone: +91-11-7416489 Fax: +91-11-7257471 

INFORMATION FOR SEQ ID NO: 1 
1. SEQUENCE CHARACTERISTICS 
(A) LENGTH: 24 bp 
(B) TYPE: DNA 
5' CTC CGC CTC AGA CTG TTT TGG TAG 3' 
2. ORGANISM: Artificial sequence 
3. IMMEDIATE SOURCE: Synthetic 
4. NAME/KEY: Synthetic Oligonucleotide 
5. SEQUENCE ID # 1 

INFORMATION FOR SEQ ID NO: 2 
1. SEQUENCE CHARACTERISTICS 
(A) LENGTH: 20 bp 
(B) TYPE: DNA 
5' GTG GCC GAG GAC GAG GAG AC 3' 
2. ORGANISM: Artificial sequence 
3. IMMEDIATE SOURCE: Synthetic 
4. NAME/KEY: Synthetic Oligonucleotide 
5. SEQUENCE ID # 2 

INFORMATION FOR SEQ ID NO: 3 
1. SEQUENCE CHARACTERISTICS 
(A) LENGTH: 31 bp 
(B) TYPE: DNA 
5' CTC GGC GGG CCT CCC CGC CCC TTC GTC GTC C 3' 
2. ORGANISM: Artificial sequence 
3. IMMEDIATE SOURCE: Synthetic 
4. NAME/KEY: Synthetic Oligonucleotide 
5. SEQUENCE ID # 3 

INFORMATION FOR SEQ ID NO: 4 
1. SEQUENCE CHARACTERISTICS 
(A) LENGTH: 31 bp 
(B) TYPE: DNA 
5' CTC GGC GGG CCT CCC CGC CCC TTC GTC GTC G 3' 
2. ORGANISM: Artificial sequence 
3. IMMEDIATE SOURCE: Synthetic 
4. NAME/KEY: Synthetic Oligonucleotide 
5. SEQUENCE ID # 4 

INFORMATION FOR SEQ ID NO: 5 
1. SEQUENCE CHARACTERISTICS 
(A) LENGTH: 21 bp 
(B) TYPE: DNA 
5' CCT CCC CGC CCC TTC GTC GTC 3' 
2. ORGANISM: Artificial sequence 
3. IMMEDIATE SOURCE: Synthetic 
4. NAME/KEY: Synthetic Oligonucleotide 
5. SEQUENCE ID # 5 

INFORMATION FOR SEQ ID NO: 6 
1. SEQUENCE CHARACTERISTICS 
(A) LENGTH: 32 bp 
(B) TYPE: DNA 
5' CGC CAA CCC GCG CCT CCC CGC TCG GCG CCC GC 3' 
2. ORGANISM: Artificial sequence 
3. IMMEDIATE SOURCE: Synthetic 
4. NAME/KEY: Synthetic Oligonucleotide 
5. SEQUENCE ID # 6 

INFORMATION FOR SEQ ID NO: 7 
1. SEQUENCE CHARACTERISTICS 
(A) LENGTH: 32 bp 
(B) TYPE: DNA 
5' CGC CAA CCC GCG CCT CCC CGC TCG GCG CCC GT 3' 
2. ORGANISM: Artificial sequence 
3. IMMEDIATE SOURCE: Synthetic 
4. NAME/KEY: Synthetic Oligonucleotide 
5. SEQUENCE ID # 7 

INFORMATION FOR SEQ ID NO: 8 
1. SEQUENCE CHARACTERISTICS 
(A) LENGTH: 22 bp 
(B) TYPE: DNA 
5' GCG CCT CCC CGC TCG GCG CCC G 3' 
2. ORGANISM: Artificial sequence 
3. IMMEDIATE SOURCE: Synthetic 
4. NAME/KEY: Synthetic Oligonucleotide 
5. SEQUENCE ID # 8 

INFORMATION FOR SEQ ID NO: 9 
1. SEQUENCE CHARACTERISTICS 
(A) LENGTH: 27 bp 
(B) TYPE: DNA 
5' CCC CTT CGT CGT CCT CCT TCT CCC CCT 3' 
2. ORGANISM: Artificial sequence 
3. IMMEDIATE SOURCE: Synthetic 
4. NAME/KEY: Synthetic Oligonucleotide 
5. SEQUENCE ID # 9 

INFORMATION FOR SEQ ID NO: 10 
1. SEQUENCE CHARACTERISTICS 
(A) LENGTH: 27 bp 
(B) TYPE: DNA 
5' CCC CTT CGT CGT CGT CCT TCT CCC CCT 3' 
2. ORGANISM: Artificial sequence 
3. IMMEDIATE SOURCE: Synthetic 
4. NAME/KEY: Synthetic Oligonucleotide 
5. SEQUENCE ID # 10 

INFORMATION FOR SEQ ID NO: 11 
1. SEQUENCE CHARACTERISTICS 
(A) LENGTH: 27 bp 
(B) TYPE: DNA 
5' CGC TCG GCG CCC GCG CGT CCC CGC CGC 3' 
2. ORGANISM: Artificial sequence 
3. IMMEDIATE SOURCE: Synthetic 
4. NAME/KEY: Synthetic Oligonucleotide 
5. SEQUENCE ID # 11 

INFORMATION FOR SEQ ID NO: 12 
1. SEQUENCE CHARACTERISTICS 
(A) LENGTH: 27 bp 
(B) TYPE: DNA 
5' CGC TCG GCG CCC GTG CGT CCC CGC CGC 3' 
2. ORGANISM: Artificial sequence 
3. IMMEDIATE SOURCE: Synthetic 
4. NAME/KEY: Synthetic Oligonucleotide 
5. SEQUENCE ID # 12 